Preface
Next-generation sequencing (NGS) data analysis is the only way to make sense of the massive genomic data produced by high-throughput sequencing technologies and accumulated, in gigabytes and terabytes, on our hard drives and in cloud databases. With computational resources and elegant algorithms for NGS data analysis now available, scientists need to master these analysis tools to achieve the goals of their research. Proficiency in NGS data analysis has already become one of the most important skills that bioinformaticians and biologists must acquire to keep abreast of progress in modern biology and to take advantage of the genomic technologies and resources that have become the de facto standard in bioscience research and its applications. These applications include diagnosis, drug and vaccine discovery, medical studies, and the investigation of pathways that give clues to many biological activities and to the pathogenicity of diseases.
Over the last two decades, advances in next-generation sequencing have had a strong positive impact on human life and marked a forward stride for human civilization. The introduction of new sequencing technologies has revolutionized bioscience, and as a result genomics has emerged as a major field of biology. Genomics focuses on the composition, structure, functional units, evolution, and manipulation of genomes, and it generates massive amounts of data that must be ingested and analyzed. As a consequence, bioinformatics has also emerged as an interdisciplinary science that addresses the specific needs of data acquisition, storage, processing, and analysis, and of integrating these data into broader resources that enrich genomic research.
This book is designed primarily as a companion for researchers and graduate students who use sequencing data analysis in their research, and it also serves as a textbook for teachers and students in biology and bioscience. It contains up-to-date material covering most NGS applications and meets the requirements of a complete semester course. The book digs deep into the analysis, providing both concepts and practice to meet the needs of researchers who seek to understand and use NGS data preprocessing, genome assembly, variant discovery, gene profiling, epigenetics, and metagenomics. Rather than presenting the analysis pipelines as black boxes, as many existing books do, it covers each topic in detail alongside the analysis steps, giving readers the scientific and technical background that enables them to conduct the analysis with confidence and understanding.
The book consists of eight chapters. All chapters include real-world worked examples
that demonstrate the steps of the analysis workflow with real data downloadable from the